A Word Selection Model Based on Lexical Semantic Knowledge in English Generation

Authors

  • Yi-Dong Chen
  • Tangqiu Li
  • Xu-Ling Zheng
Abstract

Word selection is a vital factor in improving the quality of machine translation. This paper introduces a new model for word selection based on lexical semantic knowledge, which can handle the problem significantly better. The construction of the English lexical semantic knowledge base that the model requires in our Chinese-English machine translation system is also discussed in detail.

1 Word Selection Methods Based on Lexical Semantic Knowledge in Generation

The task of vocabulary handling in machine translation is to map source-language words or phrases to their counterparts in the target language. This task has to be performed in almost every stage of machine translation, since words are the basic elements of a sentence. A word in the source language can be translated into many different words in the target language, because homophony and synonymy create a 1-to-N mapping between the words of different languages. Only one of the candidates should be chosen, according to the context. This work is called word selection. If even one target word is selected improperly, the translated sentence often becomes quite unreadable, or its meaning departs considerably from that of the source sentence. Word selection is regarded as one of the most important and most difficult problems in machine translation (Liu Xiaohu et al., 1998).

As machine translation has developed, researchers have realized that, in word selection, the semantic constraints on each candidate word matter more than its syntactic constraints, and they are paying more and more attention to the application of semantic knowledge in machine translation. Sections 1.1 and 1.2 describe two frequently used methods of this kind.

1.1 Semantic Pattern Based Method

In this method, a semantic pattern consists of a headword and one or more slots of semantic constraints.
A semantic pattern base containing a great number of such patterns has to be constructed first. During word selection, the probability of each candidate is calculated by comparing the semantic slot constraints of its pattern with the actual semantic environment of a concept, that is, the interlingua structure. The interlingua structure is structurally similar to the pattern but contains the concept to be expressed with a proper target word. Finally, the pattern with the highest probability is chosen as the basis of the word selection. This method is usually referred to as the Rationalist Method and was first used in DIOGENES (Nirenburg et al., 1998), developed at Carnegie Mellon University.

This method has a few main weak points. First, the pattern base is usually constructed manually, and it is hard to construct a good one without omissions. Subjective factors are also introduced while such a pattern base is being constructed. Second, the semantic slot constraints in manually made patterns are usually high-level concepts, so the variety and particularity in the natural language

1 This paper was supported by the Chinese 863 High Tech Research Fund (2001AA114110) and the Fund of Key Research Project of Fujian Province (2001H023).
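
The pattern matching described in Section 1.1 can be illustrated with a small Python sketch. It is not the implementation used in DIOGENES or in the authors' system. The semantic class hierarchy, the role names (`agent`, `patient`), the two example patterns, and the scoring rule (the fraction of slots whose constraint is satisfied, used here as a stand-in for the probability mentioned in the text) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical is-a hierarchy of semantic classes (child -> parent).
HYPERNYMS = {
    "human": "animate",
    "animal": "animate",
    "animate": "entity",
    "food": "entity",
    "medicine": "entity",
}


def is_a(sem_class, constraint):
    """Return True if sem_class equals constraint or descends from it."""
    current = sem_class
    while current is not None:
        if current == constraint:
            return True
        current = HYPERNYMS.get(current)
    return False


@dataclass
class SemanticPattern:
    headword: str  # candidate target word
    slots: dict    # role name -> required semantic class


def score(pattern, environment):
    """Fraction of the pattern's slots satisfied by the environment.

    `environment` maps role names to the semantic classes of the actual
    fillers, playing the part of the interlingua structure (assumed shape).
    """
    if not pattern.slots:
        return 0.0
    matched = sum(
        1
        for role, constraint in pattern.slots.items()
        if role in environment and is_a(environment[role], constraint)
    )
    return matched / len(pattern.slots)


def select_word(patterns, environment):
    """Pick the headword of the best-scoring pattern."""
    return max(patterns, key=lambda p: score(p, environment)).headword


# Two hypothetical English candidates for one source-language verb.
PATTERNS = [
    SemanticPattern("eat", {"agent": "animate", "patient": "food"}),
    SemanticPattern("take", {"agent": "human", "patient": "medicine"}),
]

if __name__ == "__main__":
    print(select_word(PATTERNS, {"agent": "human", "patient": "medicine"}))
    print(select_word(PATTERNS, {"agent": "animal", "patient": "food"}))
```

In this toy setting, the first weak point noted above is visible directly: every pattern and every entry in the hierarchy has to be written by hand, so any missing entry silently lowers a candidate's score.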



Publication date: 2003